You are viewing the RapidMiner Studio documentation for version 10.1.
Read Kafka Topic (Kafka Connector)
Synopsis
This operator reads messages from a Kafka topic on a specific Kafka cluster.
Description
The operator can either retrieve all previous messages available on the topic or collect new incoming messages. New messages are collected either for a specified amount of time or until a specified number of messages has been retrieved.
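The difference between the two retrieval modes can be illustrated with a minimal sketch that uses a plain Python list as a stand-in for a topic's message log. The `TopicLog` class and its methods are hypothetical names made up for this illustration. They are not part of RapidMiner or of any Kafka client library, and the operator's actual implementation may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TopicLog:
    """Hypothetical in-memory stand-in for the message log of a topic."""
    messages: List[str] = field(default_factory=list)

    def append(self, message: str) -> None:
        self.messages.append(message)

    def start_offset(self, strategy: str) -> int:
        # "earliest" starts at the first available message,
        # "latest" starts after the last message already in the log.
        if strategy == "earliest":
            return 0
        if strategy == "latest":
            return len(self.messages)
        raise ValueError(f"unknown offset strategy: {strategy}")

    def read_from(self, offset: int) -> List[str]:
        return self.messages[offset:]


log = TopicLog(["m0", "m1"])

# Position of a "latest" reader, recorded before any new message arrives.
latest_offset = log.start_offset("latest")

# An "earliest" reader sees everything already stored on the topic.
previous = log.read_from(log.start_offset("earliest"))

# A new message arrives; only the "latest" reader treats it as incoming.
log.append("m2")
incoming = log.read_from(latest_offset)
```

Here `previous` holds the messages that were already on the topic, while `incoming` holds only the message that arrived after the reader's position was recorded.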
Input
 connection (Connection) - The connection to the Kafka server from which the messages are read.
Output
 out (Data Table) - The ExampleSet with the collected messages.
Parameters
- kafka_topic
  The name of the Kafka topic to read.
- update_topics
  Try to retrieve a list of available topics from the server.
- offset_strategy
  The polling strategy for the topic.
  - earliest: Messages are retrieved beginning with the earliest available message.
  - latest: Only new incoming messages are collected.
- retrieval_time_out
  Time out when retrieving old messages. Typically relatively short, unless millions of records are retrieved. Only applicable if the offset strategy is set to earliest.
- collection_strategy
  The strategy for collecting new messages.
  - duration: The operator waits and collects all new messages that arrive in the next n seconds.
  - number: The operator waits until n messages are retrieved.
- counter
  Counter for the collection strategy. It is either the duration in seconds for the operator to wait or the number of messages to collect.
- time_out
  If the collection strategy is number, this is an additional time out that prevents the operator from waiting too long for enough messages to arrive, for example when the message producer is inactive.
- polling_time_out
  The time out for each individual poll to the Kafka cluster. Increase this value if the connection has high latency and you experience lost messages.
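How the collection parameters interact can be sketched as a simple polling loop. This is an illustration under stated assumptions, not the operator's actual implementation: the `collect` function, the `FakeSource` helper, and the `poll` callback are hypothetical, and all time outs are assumed to be in seconds (the source only states seconds for the duration counter).

```python
import time
from typing import Callable, List


def collect(poll: Callable[[float], List[str]],
            collection_strategy: str,
            counter: int,
            time_out: float = 0.0,
            polling_time_out: float = 1.0,
            clock: Callable[[], float] = time.monotonic) -> List[str]:
    """Collect messages by duration or by number (illustrative only).

    poll(timeout) is a hypothetical callback that returns the messages
    received within `timeout` seconds, possibly an empty list.
    """
    if collection_strategy not in ("duration", "number"):
        raise ValueError(f"unknown collection strategy: {collection_strategy}")

    # duration: stop after `counter` seconds.
    # number: stop after `time_out` seconds at the latest.
    limit = counter if collection_strategy == "duration" else time_out
    start = clock()
    collected: List[str] = []
    while clock() - start < limit:
        collected.extend(poll(polling_time_out))
        if collection_strategy == "number" and len(collected) >= counter:
            break  # enough messages; the last batch may overshoot `counter`
    return collected


class FakeSource:
    """Hypothetical message source with a simulated clock, for demonstration."""

    def __init__(self, batches: List[List[str]]) -> None:
        self.batches = list(batches)
        self.now = 0.0

    def clock(self) -> float:
        return self.now

    def poll(self, timeout: float) -> List[str]:
        # Assumption: every poll blocks for the full polling time out.
        self.now += timeout
        return self.batches.pop(0) if self.batches else []


# duration: collect whatever arrives within 3 seconds.
source = FakeSource([["a"], ["b", "c"], [], ["d"]])
by_duration = collect(source.poll, "duration", counter=3,
                      polling_time_out=1.0, clock=source.clock)

# number: wait for 2 messages, but give up after 10 seconds.
source = FakeSource([["a"], [], ["b"], ["c"]])
by_number = collect(source.poll, "number", counter=2, time_out=10.0,
                    polling_time_out=1.0, clock=source.clock)

# number with an inactive producer: the additional time out ends the wait.
source = FakeSource([["a"]])
timed_out = collect(source.poll, "number", counter=5, time_out=3.0,
                    polling_time_out=1.0, clock=source.clock)
```

In the last call the producer delivers only one message, so the loop ends when the additional time out elapses and returns fewer messages than requested.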
Tutorial Processes
Train and Apply Clustering on data from Kafka Topic
This tutorial process demonstrates the use of the Read Kafka Topic operator.
